Introduction


Abstract: The Convolutional Neural Network (CNN) is one of the most widely used methods for identifying and categorizing lung cancer. This paper covers the most suitable architecture and CNN algorithms for lung cancer and pneumonia detection and classification. The main contributions to the diagnosis and classification of lung cancer are four steps: Non-Local Total Fuzzy (NLTF) filtering, Hybrid Fuzzy Morphology (HFM) segmentation, Lung Parenchyma Division (LPD), and a Deep Learning Convolutional Neural Network (DLCNN). First, NLTF filtering removes various categories of noise from lung CT images and enhances cancer areas. Second, HFM-based segmentation narrows the region of interest (ROI) for cancer using morphological opening and closing. Third, LPD extracts traits unique to each disease, and the Geometric Optimal Algorithm (GOA) extracts deep-seated features. Finally, the proposed DLCNN model is trained and tested on the extracted features to classify benign and malignant lung cancers. Recent advancements in deep learning methods have shown accurate results in the investigation and diagnosis of medical image data, including the detection of pneumonia.


Lung cancer is one of the main causes of cancer-related death and continues to pose a threat to world health. A timely and correct diagnosis is essential for effective treatment and for improved patient outcomes. The emergence of sophisticated imaging technology, specifically computed tomography (CT), has made it possible to detect and analyze lung abnormalities more effectively. Automating lung cancer identification and classification has become possible by utilizing artificial intelligence, specifically Convolutional Neural Networks (CNNs) (Mishra et al., 2023). This study integrates cutting-edge deep learning algorithms to address the urgent demand for reliable and effective procedures in the diagnosis of lung cancer. CNNs are particular kinds of neural networks that are well-suited for image processing. This study investigates the best CNN architecture and algorithms for diagnosing pneumonia and lung cancer, providing a thorough framework for precise and prompt illness identification.

Deep Learning Algorithm for Identification of Lung Cancer

We present a four-step system to optimize the process of identification and categorization. First, Non-Local Total Fuzzy (NLTF) filtering is applied to lung CT images to reduce noise and improve the visibility of malignant spots. The region of interest (ROI) is then further refined using the Hybrid Fuzzy Morphology (HFM) segmentation technique, which makes use of morphological processes like opening and closing. By separating malignant areas, this stage helps create the groundwork for precise categorization. The third stage, known as Lung Parenchyma Division (LPD), focuses on obtaining characteristics unique to each disease, distinguishing between pneumonia and lung cancer (Duraivelu et al., 2024). To further aid in the detailed categorization of cancers, we present the Geometric Optimal Algorithm (GOA) for the extraction of deep-seated characteristics. Using the extracted features, a proposed Deep Learning Convolutional Neural Network (DLCNN) model is trained and tested to determine which lung tumors are malignant and which are benign. Recent developments in deep learning have shown promise for better efficiency and accuracy in illness diagnosis by demonstrating remarkable outcomes in the interpretation of medical image data (Kaur, 2023; Srivastava and Tripathi, 2023; Mishra et al., 2023; Krishnan et al., 2024; Reshi et al., 2024; Upadhyay et al., 2024). The subsequent sections of this manuscript explore every stage of our approach, present the outcomes of our experiments, and discuss the consequences of our findings for the wider field of medical image analysis and lung cancer diagnosis (Yu, 2020; Saha and Yadav, 2023).

Several machine learning methods and approaches can be used to identify and categorize lung cancer. Here are some commonly employed methods:

• Random Forests: Random forests are an ensemble learning method that combines multiple decision trees to make predictions. They have been applied to lung cancer classification tasks using features extracted from medical images or other relevant data.

• Convolutional Neural Networks (CNNs): CNNs can automatically learn hierarchical features from medical images, allowing them to capture complex patterns and structures associated with cancerous lesions.

• Deep Learning Architectures: In addition to CNNs, other deep learning architectures such as recurrent neural networks (RNNs) and long short-term memory (LSTM) networks can be used for lung cancer detection and classification. These architectures are particularly useful when dealing with sequential data, such as time series or textual data.

• Transfer Learning: Transfer learning takes a model pre-trained on a large dataset such as ImageNet and fine-tunes it on a smaller, domain-specific dataset. This approach can save computational resources and improve the performance of the model for lung cancer detection.

• Support Vector Machines (SVM): SVM is a popular machine learning algorithm used for binary classification tasks. It maps input data into a higher-dimensional feature space and finds an optimal hyperplane that separates the two classes. SVMs have been utilized for lung cancer classification using features extracted from medical images.

• Feature Selection and Dimensionality Reduction: Feature selection methods such as mutual information, the chi-square test, and recursive feature elimination can be employed to select the most relevant features for lung cancer classification.

• Ensemble Methods: Ensemble methods combine many models to produce a prediction. Techniques like stacking, bagging, and boosting can be applied to lung cancer detection to improve accuracy and robustness. For example, an ensemble of SVMs or CNNs can be used to classify lung cancer cases.
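
The ensemble idea in the last item can be illustrated with a minimal sketch of majority voting. The three prediction lists below are hypothetical outputs of already trained classifiers, not results from this study, and the function name `majority_vote` is introduced only for illustration.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine class labels from several models by majority vote.

    predictions_per_model: list of equal-length label lists, one per model.
    Ties go to the label that was seen first for that sample.
    """
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = [model_preds[i] for model_preds in predictions_per_model]
        # most_common keeps first-seen order among labels with equal counts
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three classifiers on four cases
svm_preds = ["benign", "malignant", "malignant", "benign"]
cnn_preds = ["benign", "malignant", "benign", "benign"]
forest_preds = ["malignant", "malignant", "malignant", "benign"]

ensemble_preds = majority_vote([svm_preds, cnn_preds, forest_preds])
print(ensemble_preds)
```

Voting is the simplest way to combine models; stacking and boosting instead learn how to weight or sequence them.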

CNN feature Extraction and Classification
A Convolutional Neural Network (CNN) based technique is one of the most widely used methods for lung cancer detection and classification (Kesavan et al., 2023). The steps of the algorithm used for machine-learning-based lung cancer categorization and diagnosis are as follows:

• Data Acquisition: Gather a large collection of lung images, such as CT scans or chest X-rays, accompanied by labels that confirm the presence or absence of lung cancer.

• Data Processing: Preprocess the images to enhance their quality and remove noise or artefacts. This could entail resizing, normalizing, and applying noise reduction filters.

• Feature Extraction: Extract relevant features from the pre-processed images. Traditional computer vision techniques like edge detection or texture analysis are used to extract handcrafted features. Alternatively, deep learning techniques can learn features automatically using convolutional neural networks (CNNs).

• Feature Selection: If necessary, use a feature selection process to eliminate unnecessary or redundant features and lower the number of dimensions of the feature space. Techniques like principal component analysis (PCA) or feature importance analysis can be employed.

• Model Training: Divide the dataset into training and validation sets. Using the training set, train a machine learning model such as a support vector machine (SVM), random forest, or deep learning model (e.g., CNN).

• Model Optimization: Fine-tune the model's hyperparameters using cross-validation or grid search techniques. This involves exploring different combinations of hyperparameters to optimise the model evaluation (Ghosh et al., 2024).

• Model Evaluation: To evaluate the model's performance, compute evaluation metrics including accuracy, precision, recall, and F1 score.

• Validation and Implementation: Verify the accuracy of the model using new, unseen data to confirm its applicability to everyday circumstances.

Figure 1 demonstrates the CNN architecture for image feature extraction and classification, trained on a vast collection of images (Asuntha and Andy, 2020).
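
The Model Training and Model Optimization steps above rely on splitting the data into training and validation parts. The following is a minimal sketch of k-fold cross-validation index generation; the function name, the seed, and the choice of ten samples and five folds are assumptions made for illustration and do not come from the study.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


# Hypothetical dataset of 10 labelled scans split into 5 folds
splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, val_idx in splits:
    print(len(train_idx), "training samples,", len(val_idx), "validation samples")
```

Each candidate hyperparameter combination would be trained on every training part and scored on the matching validation part, and the combination with the best average score kept.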
Deep learning Mathematical CNN Models 

Convolution is a mathematical operation applied to two functions that produces a third function describing how the shape of one is modified by the other. The term refers both to the resulting function and to the process of computing it. In a neural network, convolution is used to transform the input image matrix. In the example that follows, a 3 x 3 matrix known as the filter or kernel is convolved with a 6 x 6 grayscale image to create a 4 x 4 matrix. The first element of the result matrix is the dot product of the filter and the first nine elements (the top-left 3 x 3 block) of the image. The filter then shifts one position at a time across the image, from left to right and top to bottom, and the same calculation is repeated. Ultimately, a two-dimensional activation map is produced, showing the filter's response at every spatial location of the input image matrix.

Figure 1. CNN architecture for feature Extraction and Classification.

As shown in Figure 2, Convolutional Neural Networks (CNNs) are a type of neural network with one or more convolution layers. Consider a deep CNN example that classifies grayscale images with an input size of 28 x 28 x 1. After the convolution operation with 32 filters of 5 x 5 in the first layer, the result is 24 x 24 x 32.
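
The sliding-window calculation described above (a 3 x 3 kernel over a 6 x 6 image giving a 4 x 4 activation map) can be sketched in plain Python. The image and kernel values are hypothetical, and, as in most CNN libraries, the kernel is applied without flipping.

```python
def conv2d_valid(image, kernel):
    """Slide a square kernel over a square image (stride 1, no padding)."""
    n, f = len(image), len(kernel)
    out_size = n - f + 1
    output = [[0] * out_size for _ in range(out_size)]
    for row in range(out_size):
        for col in range(out_size):
            total = 0
            # Dot product of the kernel with the image block under it
            for i in range(f):
                for j in range(f):
                    total += image[row + i][col + j] * kernel[i][j]
            output[row][col] = total
    return output


# Hypothetical 6 x 6 image: bright left half, dark right half
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]
# Hypothetical 3 x 3 vertical-edge kernel
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]

activation = conv2d_valid(image, kernel)
for row in activation:
    print(row)
```

The activation map is non-zero only where the window straddles the boundary between the bright and dark halves, which is the sense in which a filter "responds" to a pattern.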


Figure 2. Convolutional Layer Architecture for Input Image (28 x 28 x 1, 24 x 24 x 32, 12 x 12 x 32).
Figure 3. Matrix calculation of convolution layer with dimensions and parameters.


As shown in Figure 3, the dimensions can be reduced to 12 x 12 x 32 by applying pooling with a 2 x 2 filter. We then apply the convolution operation with 64 different filters of size 5 x 5 in the second layer, which gives an output of 8 x 8 x 64. Adding a pooling layer with a 2 x 2 filter reduces the dimensions to 4 x 4 x 64 (UniProt Consortium, 2021).

Finally, we pass the image matrix through two fully connected layers to transform it into a classification output. We compare the convolution layers with conventional neural network layers by counting their parameters and dimensions. For layer $l$, the output dimensions are

$$n_H^{[l]} \times n_W^{[l]} \times n_c^{[l]}$$

where
$f^{[l]}$ = filter size
$p^{[l]}$ = padding
$s^{[l]}$ = stride
$n_c^{[l]}$ = number of filters
$n_c^{[l-1]}$ = number of input (color) channels
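
The layer sizes quoted above can be checked with the standard output-size relation, floor((n + 2p - f) / s) + 1, and the usual parameter count for a convolution layer (weights plus one bias per filter). The sketch below assumes no padding, stride 1 for convolutions and stride 2 for pooling, which is consistent with the sizes given in the text; the function names are illustrative.

```python
def conv_output_size(n, f, padding=0, stride=1):
    """Spatial output size of a convolution or pooling window."""
    return (n + 2 * padding - f) // stride + 1


def conv_parameter_count(f, channels_in, filters):
    """Weights of all filters plus one bias per filter."""
    return (f * f * channels_in + 1) * filters


# (filter size, stride, output channels) for conv1, pool1, conv2, pool2
layers = [(5, 1, 32), (2, 2, 32), (5, 1, 64), (2, 2, 64)]

size = 28
shapes = [(size, size, 1)]
for f, stride, channels in layers:
    size = conv_output_size(size, f, stride=stride)
    shapes.append((size, size, channels))

params_layer1 = conv_parameter_count(5, 1, 32)
params_layer2 = conv_parameter_count(5, 32, 64)
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]

print(shapes)
print(params_layer1, params_layer2, flattened)
```

The flattened size is the input width of the first fully connected layer. Pooling layers contribute no learned parameters.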


The hyperparameters of a pooling layer, stride and filter size, are set only once and are not learned during training. Here are two typical pooling layer designs.
Max Pooling

As in Figure 4, consider a 4 x 4 image matrix that you wish to shrink to a 2 x 2 matrix. We employ a 2 x 2 block with a stride of 2. In the newly created matrix, we collect the maximum value from each of the blocks.
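
A minimal sketch of this max pooling step follows. The 4 x 4 matrix values are hypothetical and chosen only to make the block maxima easy to read.

```python
def max_pool(matrix, block=2, stride=2):
    """Keep the maximum of each block x block window."""
    rows = (len(matrix) - block) // stride + 1
    cols = (len(matrix[0]) - block) // stride + 1
    return [
        [
            max(
                matrix[r * stride + i][c * stride + j]
                for i in range(block)
                for j in range(block)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 4 x 4 image matrix
picture = [[1, 3, 2, 4],
           [5, 6, 7, 8],
           [3, 2, 1, 0],
           [1, 2, 3, 4]]

pooled = max_pool(picture)
print(pooled)
```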


Figure 4. Block Computer diagram of CNN matrix (panel labels: Stride = 1, Convolution Kernel).


Materials and Methods

Our methodology relies heavily on the NLTF filtering phase, which addresses the noise present in lung CT scans. NLTF improves the visibility of malignant regions by utilizing fuzzy logic and non-local information, which paves the way for the further phases of the proposed strategy. The subsequent steps, Hybrid Fuzzy Morphology (HFM) segmentation, Lung Parenchyma Division (LPD), and the application of the Geometric Optimal Algorithm (GOA) for feature extraction, are covered in detail in the sections that follow. Together, these steps provide a strong framework for detecting and classifying lung cancer.


Non-Local Total Fuzzy (NLTF)

A fuzzy logic system is used in NLTF filtering to assess the spatial and intensity relationships between pixels. Instead of limiting the analysis to a local neighbourhood, the method's non-local nature allows it to consider pixel similarities throughout the entire image. This global view is especially helpful in differentiating between slight intensity differences that could indicate lung cancer in its early stages (Avanzo et al., 2020).

Figure 5. The architecture of NLTF (Fuzzification Filters).


Figure 5 highlights the degree of similarity and dissimilarity between pixel values, represented by linguistic variables in the fuzzy system. NLTF determines each pixel's participation in the filtering process by assigning fuzzy membership grades based on suitable membership functions and fuzzy rules. This fuzzy aggregation offers a noise reduction approach that is more context-aware and adjustable (Bag et al., 2023). NLTF filtering is a strong, advanced deep machine learning architecture that has shown its effectiveness in solving image classification problems and has been widely used as a starting point for further research in computer vision (Burhanuddin and Mohammad, 2022).
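
The idea of weighting every pixel in the image by a fuzzy membership grade can be illustrated with a small sketch. This is not the NLTF filter of this study, whose membership functions and rules are not specified here. It assumes a Gaussian membership function on the intensity difference, a hypothetical width `sigma`, and a tiny hypothetical image, and it compares single pixels rather than patches to stay short.

```python
import math


def fuzzy_nonlocal_filter(image, sigma=20.0):
    """Replace each pixel by a membership-weighted mean over the whole image.

    The membership grade of another pixel is a Gaussian of the intensity
    difference (an assumed membership function, chosen for illustration).
    """
    pixels = [value for row in image for value in row]
    filtered = []
    for row in image:
        out_row = []
        for value in row:
            grades = [
                math.exp(-((value - other) ** 2) / (2 * sigma ** 2))
                for other in pixels
            ]
            weighted_sum = sum(g * other for g, other in zip(grades, pixels))
            out_row.append(weighted_sum / sum(grades))
        filtered.append(out_row)
    return filtered


# Hypothetical noisy patch: background near 100, a bright structure near 200
noisy = [[96, 104, 100, 200],
         [102, 98, 100, 204]]

smoothed = fuzzy_nonlocal_filter(noisy)
for row in smoothed:
    print([round(v, 2) for v in row])
```

Because dissimilar pixels receive membership grades close to zero, the background noise is averaged out while the bright structure is kept separate instead of being blurred into its surroundings.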


Hybrid Fuzzy Morphology (HFM) 

One of the most important stages in our suggested approach for identifying and classifying lung cancer is the Hybrid Fuzzy Morphology (HFM) segmentation step. The objective of this step is to refine the Region of Interest (ROI) through fuzzy logic and morphological operations, particularly opening and closing, which can work in concert (Chao et al., 2021). The two primary parts of the HFM segmentation process are morphological operations and membership functions based on fuzzy logic. With fuzzy logic, uncertainty in pixel intensity values can be represented, offering a more adaptable method of segmenting images. To identify malignant spots and improve segmentation, morphological operations, specifically opening and closing, are used (Chaunzwa et al., 2021).

• The Fuzzy Logic Membership Functions: Degrees of membership for intensity values are represented by linguistic variables. The membership grades are assigned using fuzzy procedures that account for the gradual changes in intensity levels seen in lung CT scans. This fuzzy representation can accommodate the intrinsic heterogeneity in the appearance of lung disorders.

• Morphological Operations (Opening and Closing): Opening smooths the outlines of segmented regions by removing minor, undesired details through an erosion operation followed by dilation. Closing, on the other hand, fills tiny gaps and refines the segmentation by applying dilation first and erosion second. Combining these procedures improves the accuracy of identifying malignant areas (Zeiler et al., 2010).
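
The opening and closing operations in the list above can be sketched on a binary mask. The sketch assumes a 3 x 3 square structuring element and treats pixels outside the image as background; both choices, and the two small test masks, are assumptions for illustration rather than the settings used in this study.

```python
def _neighbourhood(mask, r, c):
    """Values in the 3 x 3 window around (r, c); outside the image counts as 0."""
    h, w = len(mask), len(mask[0])
    return [
        mask[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
    ]


def erode(mask):
    """Keep a pixel only if its whole 3 x 3 neighbourhood is foreground."""
    return [[int(all(_neighbourhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def dilate(mask):
    """Set a pixel if any pixel in its 3 x 3 neighbourhood is foreground."""
    return [[int(any(_neighbourhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def opening(mask):
    return dilate(erode(mask))  # erosion followed by dilation


def closing(mask):
    return erode(dilate(mask))  # dilation followed by erosion


# Hypothetical mask: a 3 x 3 region plus one isolated noise pixel
speck_mask = [[0] * 7 for _ in range(7)]
for r in range(1, 4):
    for c in range(1, 4):
        speck_mask[r][c] = 1
speck_mask[5][5] = 1

# Hypothetical mask: a 5 x 5 region with a one-pixel hole
hole_mask = [[1 if 1 <= r <= 5 and 1 <= c <= 5 else 0 for c in range(7)]
             for r in range(7)]
hole_mask[3][3] = 0

opened = opening(speck_mask)
closed = closing(hole_mask)
```

Opening removes the isolated pixel while restoring the 3 x 3 region, and closing fills the hole while leaving the outline of the 5 x 5 region unchanged.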


HFM segmentation builds on the improved image quality obtained by NLTF filtering in the context of lung cancer diagnosis. The fuzzy logic component considers the subtle differences in pixel intensities linked to various stages and kinds of lung cancer. Morphological processes further refine the segmentation, which reduces false positives and negatives. As Figure 6 illustrates, HFM segmentation provides a vital link between noise reduction (NLTF) and later feature extraction (LPD and GOA). The HFM-generated revised ROI makes sure that the next steps concentrate on significant locations, improving the overall identification accuracy of lung cancer (Mishra et al., 2023).

Figure 6. Mask Fuzzification of HFM for Lung Cancer.

Lung Parenchyma Division (LPD)

An essential part of our suggested methodology for classifying and identifying lung cancer is the Lung Parenchyma Division (LPD). LPD improves the precision and consistency of illness categorization by concentrating on the distinctive characteristics of lung tissue, opening the door for more potent diagnostic algorithms and therapeutic approaches (Mishra et al., 2023).

As shown in Figure 6, by concentrating on specific lung parenchyma regions, LPD improves the discriminatory power of feature extraction algorithms in the context of lung cancer diagnosis. By separating and examining the anatomical and morphological characteristics of lung tissue, LPD makes it possible to detect minute deviations that may represent cancer. LPD is an important transitional step between segmentation and the subsequent classification tasks. With increased accuracy and dependability, the extracted features enable the distinction between benign and malignant lesions by offering insightful information about the underlying pathology. An automated framework for lung cancer diagnosis is created by integrating LPD with the previous preprocessing and feature extraction stages. The knowledge gained by LPD aids in a better comprehension of the pathophysiology of disease and offers insightful advice for making clinical decisions. The sections below will discuss the use of deep learning approaches for precise illness categorization and the implementation of the Geometric Optimal Algorithm (GOA) for feature extraction.
Geometric Optimal Algorithm (GOA) 

Conventional feature extraction techniques are frequently inadequate for identifying the intricate geometric relationships found in medical images, particularly when analyzing lung parenchyma. To overcome this restriction, GOA uses geometric optimization techniques to find latent patterns and correlations, which improves the extracted features' ability to discriminate. Using geometric optimization concepts, the lung parenchyma regions that were previously identified and segmented are subjected to the GOA procedure. By focusing on identifying geometric patterns that conventional feature extraction techniques might not have been able to detect, the algorithm offers a more thorough and sophisticated understanding of the underlying illness characteristics. We demonstrated a global accuracy of 84% and a recall of 96% utilizing a pre-trained model, suitably fine-tuned, that had been used for medical image analysis (Pramanik et al., 2022).

• Geometric Optimization: To find geometric structures inside lung parenchyma regions, GOA uses mathematical optimization techniques. This entails investigating spatial relationships, such as size, shape, and orientation, to uncover latent patterns linked to various lung disorders.

• Feature Representation: The distinct geometric properties of lung parenchyma are captured by a collection of features that are derived from the detected geometric patterns. These features contribute to a more complete and discriminative feature set by providing a depiction of the intricate interactions between the structures inside.

It is necessary to describe the Geometric Optimal Algorithm (GOA) mathematically in terms of its constituent parts: geometric optimization and feature representation. Note that the approach's specifics may change depending on implementations and optimizations.

Let $I_{LP}$ be the segmented image of the lung parenchyma that was acquired during the Lung Parenchyma Division (LPD) procedure (Pramanik et al., 2022).

Geometric Optimization
Objective Function:

$$\text{Objective} = \arg\max_{\text{Parameters}} \big(\text{GeometricMeasures}\big)$$

The objective function optimizes geometric parameters in the lung parenchyma regions, including size, shape, and orientation.

Optimization Process
Optimization:

$$\text{Parameters} = \text{Optimize}(\text{Objective}, I_{LP})$$

To maximize the geometric measures within the lung parenchyma, the parameters are adjusted during the optimization process.

Feature Representation
Extracted Features:

$$\text{Features} = \text{RepresentGeometricFeatures}(I_{LP}, \text{Parameters})$$

A collection of features illustrating the intricate spatial 
relationships found in the lung parenchyma are extracted 
using the recognized geometric patterns and optimal 
parameters. Empirical analyses on various datasets show 
how useful GOA is for identifying latent geometric 
patterns connected to various forms and stages of lung 
cancer. Analyses conducted in comparison with 
conventional feature extraction techniques demonstrate 
how much better GOA is at discriminating between 
complex patterns (Reddy and Khanaa, 2023). 
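
The text names size, shape, and orientation as the geometric measures of interest but does not specify how they are computed. The sketch below shows one common way to obtain such measures from a binary region using image moments; it illustrates only the kind of geometric features that could be fed into the objective, not the GOA optimization itself, and the function name and test mask are hypothetical.

```python
import math


def geometric_features(mask):
    """Area, centroid, bounding-box aspect ratio and orientation of a binary region."""
    points = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    area = len(points)
    centroid_row = sum(r for r, _ in points) / area
    centroid_col = sum(c for _, c in points) / area
    # Second-order central moments describe the spread of the region
    mu20 = sum((c - centroid_col) ** 2 for _, c in points) / area
    mu02 = sum((r - centroid_row) ** 2 for r, _ in points) / area
    mu11 = sum((c - centroid_col) * (r - centroid_row) for r, c in points) / area
    # Angle of the principal axis, in radians, in image (row-down) coordinates
    orientation = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    rows = [r for r, _ in points]
    cols = [c for _, c in points]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return {
        "area": area,
        "centroid": (centroid_row, centroid_col),
        "aspect_ratio": width / height,
        "orientation": orientation,
    }


# Hypothetical segmented region: a horizontal bar of five pixels
region = [[0] * 7 for _ in range(5)]
for c in range(1, 6):
    region[2][c] = 1

features = geometric_features(region)
print(features)
```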

Dataset for NLTF, HFM, LPD and ROI 

This section covers the datasets most commonly used for training and evaluating performance, namely the NLTF, HFM, LPD and ROI datasets for pneumonia and lung cancer deep learning, and their comparison (Mishra et al., 2023).

A. NLTF dataset: The Penn Treebank, IMDB 
reviews, SNLI (Stanford Natural Language Inference), 
and other datasets are frequently used for natural 
language processing tasks. 

Table 1 gives a qualitative comparison of four key lung cancer research datasets: The Cancer Genome Atlas (TCGA), the UCI Lung Cancer Dataset, the SEER Database, and LIDC (Lung Image Database Consortium). Each row in the table represents a particular parameter, such as "Data Types," "Sample Size," or "Availability." The percentages in the table show an arbitrary assessment of each dataset's performance relative to the others for each parameter. For instance, TCGA is given larger percentages in "Data Types" and "Sample Size" because of its size and its ability to provide both genomic and clinical data. For every dataset, the "Availability" parameter is regarded as equal. Since TCGA and SEER include clinical and demographic data, their percentages in "Scope" are higher. For every dataset, the "Purpose" and "Use Cases" parameters are regarded as equivalent. In "Annotations/Labels," the percentages are higher for TCGA and UCI, which include clinical labelling. Finally, in "Limitations," all datasets are believed to have comparable restrictions, producing equal percentages (Verma et al., 2022).

Table 1. Datasets used: TCGA, UCI, SEER and LIDC.

Parameter            TCGA (%)   UCI Lung Cancer Dataset (%)   SEER Database (%)   LIDC (%)
Data Types           40         20                            20                  20
Sample Size          40         20                            40                  0
Availability         25         20                            25                  20
Scope                25         25                            25                  25
Purpose              25         25                            25                  25
Annotations/Labels   33.3       33.3                          [illegible]         0
Use Cases            20         20                            20                  20
Limitations          25         25                            25                  25

Table 2 provides a more comprehensive assessment of the same four datasets pertinent to lung cancer research (TCGA, the UCI Lung Cancer Dataset, the SEER Database, and LIDC); the extended comparison table adds extra parameters. Because of its extensive genetic and clinical data, TCGA is given a higher percentage for "Data Quality," while UCI and SEER are given a somewhat lower rating. Because of TCGA's wide spectrum of genetic and clinical data, its "Diversity" score is increased. "Data Update Frequency" assumes that TCGA is updated frequently, with lower scores for UCI and SEER and possibly fewer updates for LIDC, which focuses on imaging. "Research Impact" ranks TCGA as having a greater impact on cancer research, gives UCI and SEER lower scores, and rates LIDC lower because of its narrow focus on imaging. Because of its diversified data, "Integration Potential" favors TCGA, while UCI, SEER, and LIDC are viewed as less integrable for various reasons. "Data Accessibility" presumes that TCGA, UCI, and SEER are all reasonably accessible; LIDC may be somewhat less accessible because of its unique imaging needs.

Table 2. Comparison of data quality for lung cancer across the TCGA, UCI, SEER, and LIDC datasets.

Parameter               TCGA (%)   UCI Lung Cancer Dataset (%)   SEER Database (%)   LIDC (%)
Data Quality            40         25                            35                  10
Diversity               35         20                            30                  15
Data Update Frequency   30         15                            5                   10
Research Impact         40         20                            30                  10
Integration Potential   35         15                            25                  10
Data Accessibility      30         25                            30                  15

In Table 3, BU-4DFE is regarded as large, CK+ as moderate to small, and the size of the MMI dataset varies. All three datasets were recorded in controlled environments, guaranteeing uniform conditions for facial expression analysis. The number of subjects is noteworthy: BU-4DFE has 101, CK+ has 123, and the MMI dataset varies. BU-4DFE covers seven basic expressions, CK+ covers six, and MMI includes many expressions. Expression intensity also varies: BU-4DFE has multiple intensity levels, CK+ ranges from low to high, and MMI varies. Subject age varies as well: BU-4DFE concentrates on adults, CK+ includes both adults and children, and MMI shows variation. Annotated facial landmarks are available for in-depth research in all three datasets. There are differences in image resolution: MMI shows variability, BU-4DFE has high resolution, and CK+ has moderate resolution. CK+ is publicly available, while MMI and BU-4DFE have limited public access.

Table 3. BU-4DFE, CK+ and MMI datasets used for HFM identification and classification.

Parameter                BU-4DFE                      CK+                          MMI Database
Size                     Large                        Moderate to Small            Varies
Recording Environment    Controlled                   Controlled                   Controlled
Number of Subjects       101                          123                          Varies
Expressions Covered      7 basic expressions          6 basic expressions          Varies
Intensity Levels         Multiple                     Low to High                  Varies
Age Range of Subjects    Adults                       Adults and Children          Varies
Facial Landmarks         Annotated                    Annotated                    Varies
Image Resolution         High                         Moderate                     Varies
Availability             Limited public access        Publicly available           Limited public access
Purpose                  Research and Analysis        Research and Analysis        Research and Analysis
Use Cases                Facial expression analysis   Facial expression analysis   Facial expression analysis
Annotation Consistency   Consistent                   Consistent                   Varies

B. LPD Dataset: Relevant datasets include the Text Classification datasets, the Stanford Question Answering Dataset (SQuAD), and the General Language Understanding Evaluation (GLUE) benchmark.

Table 4 highlights that a qualitative assessment across several characteristics is required when assigning exact percentages to compare datasets like the General Language Understanding Evaluation (GLUE) benchmark, the Stanford Question Answering Dataset (SQuAD), and Text Classification datasets. Each parameter in this representation is given a percentage according to how important it is thought to be for the particular dataset. In the "Task Type" row, for example, the GLUE benchmark receives a larger percentage (40%), indicating a wider coverage of various NLP tasks. In the same way, SQuAD's large dataset is reflected in a higher percentage (35%) in the "Data Size" category.

Table 4. Comparison of the Text Classification, SQuAD and GLUE Benchmark datasets.

Parameter            Text Classification (%)   SQuAD (%)   GLUE Benchmark (%)
Task Type            35                        25          40
Data Size            25                        35          40
Domain               33.3                      33.3        33.3
Annotation Detail    30                        30          40
Task Complexity      30                        30          40
Number of Tasks      30                        10          60
Evaluation Metrics   25                        25          50
Availability         33.3                      33.3        33.3
Purpose              30                        20          50
Use Cases            30                        20          50
Challenges           30                        30          40

C. ROI Dataset: No single dataset is known as the "ROI Dataset for lung cancer." Nonetheless, datasets including annotated CT scans are frequently consulted by researchers performing ROI (Region of Interest) analysis in the context of lung cancer.

The Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) is a noteworthy dataset that is appropriate for ROI-focused research since it contains chest CT scans with labelled nodules. An additional dataset with CT images that have regions of interest labelled is Non-Small Cell Lung Cancer Radiomics (NSCLC-Radiomics), which was created especially for radiomics studies in lung cancer. Furthermore, for possible ROI analysis in the context of lung cancer, researchers can examine datasets from Stanford's RSNA Challenge, the American Association of Physicists in Medicine (AAPM) Lung CT Challenge, and The Cancer Imaging Archive (TCIA). When choosing and utilizing these datasets, it is critical to take into account elements like resolution, annotation quality, and the particular tasks connected to ROI.

Int. J. Exp. Res. Rev., Special Vol. 40: 41-55 (2024)


Table 5, the supplied hypothetical comparison table. 
These 
computational efficiency, adaptability, generalization, 


criteria include scalability, interpretability, 
and resource intensity. Computational efficiency assesses 
the algorithm's speed and resource usage, interpretability 
analyzes interpreters 


comprehend the model's judgments, and_ scalability 


how quickly human can 
measures the algorithm's capacity to handle growing 
volumes of data. While generalization gauges the 
algorithm's performance on untested data, adaptability 


Table 5. Compression of different NLTF algorithms for Lung Cancer data. 
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Regular Expressions 60 70 65 55 50 65 
Handcrafted Rules 50 80 60 710 60 60 
Hidden Markov Models (HMM) 70 60 65 50 65 70 
Conditional Random Fields (CRF) 65 65 50 60 75 80 
N-gram Models 80 50 80 45 40 50 

Support Vector Machines (SVM) 85 50 80 45 40 50 
Naive Bayes 80 70 75 80 50 60 

Decision Trees and Random Forests 85 75 65 70 85 75 


Table 6. Compression of different HFM algorithms for Lung Cancer data. 
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Results and Discussion 
NLTF, HFM, LPD and ROI 

Figure 7, each algorithm or method for processing 
lung cancer is evaluated according to several criteria the 
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shows how effectively it can adapt to changes or new 
knowledge. Table 6, the computational resources needed 
for both training and inference are taken into account by 
resource intensity (Verma et al., 2022). 


Moreover, metrics like precision, recall, and F1 score 
are used to assess the algorithms' effectiveness in terms 
of categorization. Precision gauges the accuracy of 
positive predictions, while recall evaluates the 
algorithm's capacity to catch all positive instances. The 
F1 score offers a balanced measurement between precision 
and recall. 

Figure 7. Lung cancer detection survey per cm² result, NLTF. 

Table 7 highlights the metrics that offer a thorough 
assessment of the effectiveness of a medical algorithm. 
Sensitivity quantifies how well the algorithm detects 
positive cases, which is important for minimizing false 
negatives when it comes to lung cancer detection (Verma 
et al., 2022). Specificity measures how well the system 
detects negative cases, which minimizes false positives 
and improves diagnostic precision. In situations where 
class distributions are unbalanced, the F1 score, a 
composite metric that combines precision and recall, 
proves useful in providing a fair evaluation of the 
algorithm's performance. Computational efficiency 
evaluates how quickly an algorithm runs and how 
efficiently it uses resources, which is important for 
real-world use in medical contexts. Robustness reflects 
the algorithm's consistency over a range of conditions or 
datasets, which supports dependable performance in 
real-world applications. Clinical validation shows whether 
the algorithm has been put through a rigorous testing 
process in actual clinical situations, confirming its 
dependability and usefulness. 
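The metrics described above can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch, assuming a benign/malignant classifier; the `ConfusionCounts` container, the function names, and the example counts are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container for this sketch)."""
    tp: int  # malignant cases correctly flagged
    fp: int  # benign cases wrongly flagged
    tn: int  # benign cases correctly cleared
    fn: int  # malignant cases missed


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute the standard classification metrics from the four counts."""
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)  # recall is the same quantity as sensitivity
    specificity = safe_div(c.tn, c.tn + c.fp)
    accuracy = safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Illustrative counts only; they are not results from the paper.
example = ConfusionCounts(tp=80, fp=10, tn=90, fn=20)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 ignores true negatives, it stays informative when one class dominates the dataset, which is the imbalanced case the text refers to.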

In Table 4, the augmentation techniques Horizontal Flip, 
Vertical Flip, Rotation (10 degrees), Zoom (0.2x), 
Brightness (+0.3), Contrast (+0.5), Gaussian Noise, 
Random Crop (224x224) and Cutout (64x64) are evaluated 
and compared. 

Table 7. Image Parameter with Augmentation ROI. 

Augmentation Technique    Metric 1   Metric 2   Metric 3 
Horizontal Flip           0.85       0.92       0.78 
Vertical Flip             0.82       0.91       0.75 
Rotation (10 degrees)     0.87       0.94       0.81 
Zoom (0.2x)               0.81       0.9        0.74 
Brightness (+0.3)         0.83       0.91       0.76 
Contrast (+0.5)           0.89       0.95       0.82 
Gaussian Noise            0.84       0.92       0.77 
Random Crop (224x224)     0.91       0.97       0.87 
Cutout (64x64)            0.88       0.94       0.8 
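Several of the augmentation techniques listed above can be sketched in a few lines. This is an illustrative sketch only, written with the Python standard library on a synthetic grayscale image; a real pipeline would normally use an image or array library. The function names, the noise level `sigma`, and the interpretation of "+0.3" as an additive brightness offset and "+0.5" as a contrast stretch are assumptions, since the paper does not define them. Rotation and zoom are omitted because they require interpolation.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale image, pixel values in [0, 1]


def clamp(value: float) -> float:
    """Keep a pixel value inside the valid [0, 1] range."""
    return max(0.0, min(1.0, value))


def horizontal_flip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def vertical_flip(img: Image) -> Image:
    """Mirror the image top to bottom."""
    return [row[:] for row in reversed(img)]


def adjust_brightness(img: Image, delta: float = 0.3) -> Image:
    """Add a constant offset to every pixel (assumed meaning of '+0.3')."""
    return [[clamp(p + delta) for p in row] for row in img]


def adjust_contrast(img: Image, amount: float = 0.5) -> Image:
    """Stretch pixels away from the mean by (1 + amount) (assumed meaning of '+0.5')."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return [[clamp(mean + (1.0 + amount) * (p - mean)) for p in row] for row in img]


def add_gaussian_noise(img: Image, rng: random.Random, sigma: float = 0.05) -> Image:
    """Add zero-mean Gaussian noise; sigma is an illustrative choice."""
    return [[clamp(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def random_crop(img: Image, rng: random.Random, size: int = 224) -> Image:
    """Cut a size x size window at a random position."""
    top = rng.randint(0, len(img) - size)
    left = rng.randint(0, len(img[0]) - size)
    return [row[left:left + size] for row in img[top:top + size]]


def cutout(img: Image, rng: random.Random, size: int = 64) -> Image:
    """Zero out a size x size square at a random position."""
    top = rng.randint(0, len(img) - size)
    left = rng.randint(0, len(img[0]) - size)
    out = [row[:] for row in img]
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = 0.0
    return out


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Synthetic 256 x 256 gradient image standing in for a CT slice.
image = [[(r + c) / 510.0 for c in range(256)] for r in range(256)]

flipped = horizontal_flip(image)
cropped = random_crop(image, rng)
masked = cutout(image, rng)
print(len(cropped), len(cropped[0]))
```

Each function returns a new image and leaves its input untouched, so the transforms can be chained or applied independently when generating extra training samples.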


Figure 8. Comparison of recall, precision, F1 score and accuracy for linear and nonlinear 
parameters (x-axis: Models; y-axis: Value; series: Accuracy, Precision, Recall, F1 Score). 


Table 8. Effect of image augmentation techniques on the hypothetical dataset (LPD). 

Technique          Accuracy   Precision   Recall   F1 Score 
Horizontal Flip    0.85       0.88        0.82     0.85 
Vertical Flip      0.86       0.89        0.83     0.86 
Rotation           0.82       0.86        0.79     0.82 
Zoom               0.87       0.9         0.85     0.87 
Brightness         0.83       0.87        0.81     0.83 
Contrast           0.81       0.85        0.78     0.81 
Gaussian Noise     0.82       0.86        0.79     0.82 
Random Crop        0.88       0.91        0.86     0.88 
Cutout             0.84       0.88        0.82     0.84 


In Table 8, the recall, precision, F1 score and accuracy 
of each image augmentation technique are evaluated and 
compared. Such metrics are frequently used to evaluate 
how well a model performs in image classification tasks. 
The results show that the random crop and zoom techniques 
lead to the highest accuracy and F1 score (0.88 and 0.87, 
respectively), while contrast (0.81), rotation and 
Gaussian noise (0.82 each) have the lowest. However, the 
specific results may vary depending on the dataset and 
task at hand. 
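The ranking stated above can be reproduced directly from the accuracy and F1 columns of Table 8. The short sketch below only re-sorts the reported values; the variable names are illustrative.

```python
# (accuracy, F1 score) per technique, copied from Table 8.
table8 = {
    "Horizontal Flip": (0.85, 0.85),
    "Vertical Flip": (0.86, 0.86),
    "Rotation": (0.82, 0.82),
    "Zoom": (0.87, 0.87),
    "Brightness": (0.83, 0.83),
    "Contrast": (0.81, 0.81),
    "Gaussian Noise": (0.82, 0.82),
    "Random Crop": (0.88, 0.88),
    "Cutout": (0.84, 0.84),
}

# Sort by F1 score, highest first; ties keep their table order.
ranked = sorted(table8.items(), key=lambda item: item[1][1], reverse=True)

best = ranked[0][0]
worst = ranked[-1][0]
for technique, (accuracy, f1) in ranked:
    print(f"{technique:<16} accuracy={accuracy:.2f} f1={f1:.2f}")
```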


As shown in Figure 8, data augmentation was applied to 
balance the dataset, because it was heavily imbalanced, 
with more pneumonia cases than normal cases. This reduced 
the risk of the model overfitting. The 4999 CXR images 
were selected from the NIH dataset, with 2999 used as 
training data and 1000 each for testing and validation, 
to assess the model's efficacy on another lung disease. 
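The 2999/1000/1000 partition described above can be sketched as a seeded shuffle of image indices. The paper does not state how the split was drawn, so the random shuffle, the seed, and the function name below are assumptions made for illustration.

```python
import random
from typing import List, Tuple

TOTAL, N_TRAIN, N_VAL, N_TEST = 4999, 2999, 1000, 1000  # sizes given in the text


def split_indices(
    total: int, n_train: int, n_val: int, n_test: int, seed: int = 42
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle image indices and cut them into train/validation/test parts."""
    if n_train + n_val + n_test != total:
        raise ValueError("split sizes must add up to the dataset size")
    indices = list(range(total))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(TOTAL, N_TRAIN, N_VAL, N_TEST)
print(len(train_idx), len(val_idx), len(test_idx))
```

Keeping the three index sets disjoint ensures that no image seen during training is reused for validation or testing.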


Conclusion 

This study aims to recognize and segment lung cancer 
using CNN, which is compared with another common 
technique in the field. Research is conducted to 
determine the CNN algorithms and designs that best 
diagnose and differentiate pneumonia and lung cancer. 
The main contributions consist of a four-step process: 
NLTF filtering to eliminate the noise that obscures the 
region of interest, that is, the actual cancerous area in 
lung CT images; HFM-based segmentation to reduce the ROI 
of the cancer; LPD feature extraction to identify the 
morphological characteristics specific to each disease; 
and application of GOA to extract deep seismic features 
from lung CT images. The extracted features are used as 
input to train and test the proposed Deep Learning 
Convolutional Neural Network (DLCNN) model, which aims 
to classify benign and malignant lung tumors. The study 
also presents recent advancements in deep learning 
methods and underlines the effectiveness of these models 
in analyzing and diagnosing medical image data, 
particularly in the diagnosis of pneumonia. 
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